Filtration of String Proximity Search via Transformation

Authors

  • S. Alireza Aghili
  • Divyakant Agrawal
  • Amr El Abbadi
Abstract

The problem of proximity search in biological databases is addressed. We study vector transformations and apply DFT (Discrete Fourier Transformation) and DWT (Discrete Wavelet Transformation, Haar) dimensionality reduction techniques to DNA sequence proximity search to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA contig databases demonstrate up to a 50-fold filtration ratio of the search space and up to 13 times faster filtration. The proposed transformation techniques may easily be integrated as a preprocessing phase on top of existing similarity search heuristics such as BLAST [1], PatternHunter [3], FastA [4], and QUASAR [2] to efficiently prune non-relevant sequences. We study the precision of applying dimensionality reduction techniques for faster and more efficient range query searches, and discuss the imposed trade-offs.

References

[1] S. Altschul, W. Gish, W. Miller, E. Myers, and D. J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
[2] S. Burkhardt, A. Crauser, P. Ferragina, H. P. Lenhof, E. Rivals, and M. Vingron. q-gram based database searching using a suffix array (QUASAR). In RECOMB, pages 77–83, 1999.
[3] B. Ma, J. Tromp, and M. Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440–445, March 2002.
[4] W. R. Pearson. Using the FASTA program to search protein and DNA sequence databases. Methods Mol. Biol., 25:365–389, 1994.
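The filter-and-refine idea named in the abstract (transform, reduce dimensionality, prune, then verify) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the one-hot `encode` mapping, the use of Euclidean distance on the encoded vectors, the helper names `haar`, `dist`, and `range_query`, and the requirement that the encoded length be a power of two. The sketch relies only on a general property of the orthonormal Haar transform: it preserves Euclidean distance, so a distance computed on the first `k` coefficients never exceeds the full distance and can safely reject candidates.

```python
import math

def encode(seq):
    """Hypothetical encoding (an assumption): one-hot vector per base, flattened."""
    table = {"A": (1.0, 0.0, 0.0, 0.0), "C": (0.0, 1.0, 0.0, 0.0),
             "G": (0.0, 0.0, 1.0, 0.0), "T": (0.0, 0.0, 0.0, 1.0)}
    return [x for base in seq for x in table[base]]

def haar(vec):
    """Orthonormal Haar wavelet transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    while n > 1:
        half = n // 2
        # Scaled pairwise sums (coarse part) and differences (detail part).
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff
        n = half
    return out

def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def range_query(query, database, radius, k):
    """Return sequences within `radius` of `query`, filtering on k coefficients."""
    q = haar(encode(query))
    hits = []
    for seq in database:
        s = haar(encode(seq))  # in practice this would be precomputed and stored
        # Filtration step: the truncated distance is a lower bound, so
        # exceeding the radius here rules the candidate out with no false dismissal.
        if dist(q[:k], s[:k]) > radius:
            continue
        # Refinement step: verify the survivors with the full distance.
        if dist(q, s) <= radius:
            hits.append(seq)
    return hits
```

Under this encoding the squared Euclidean distance equals twice the Hamming distance, which is not the edit-distance setting of the cited alignment tools; the sketch only shows why a truncated transform can prune a range query without missing true matches.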


Similar resources

Using Transformation Techniques Towards Efficient Filtration of String Proximity Search of Biological Sequences

The problem of proximity search in biological databases is addressed. We study vector transformations and conduct the application of DFT (Discrete Fourier Transformation) and DWT (Discrete Wavelet Transformation, Haar) dimensionality reduction techniques for DNA sequence proximity search to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA ...


BFT: A Relational-based Bit Filtration Technique for Efficient Approximate String Joins in Biological Databases

Joining massive tables in relational databases has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of pairwise whole-genome comparison into an approximate join operation in the well-established relational database context. We propose a ...


BFT: Bit Filtration Technique for Approximate String Join in Biological Databases

Joining massive tables in relational databases has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of pairwise whole-genome comparison into an approximate join operation in the well-established relational database context. We propose a ...


Application of grey GIS filtration to identify the potential area for cement plants in South Khorasan Province, Eastern Iran

Cement-based materials are fundamental resources used in construction. The increase in requests for and consumption of cement products, especially in Iran, indicates that more cement plants should be equipped. This study developed a geographical information system using pairwise comparison based on grey numbers to identify potential sites in which to set up cement plants. A group of five exp...


Distance Based Indexing for String Proximity Search

In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures a...



Publication date: 2003